Measuring the Structural Similarity between Source Code Entities (S)

Authors

Ricardo Terra

João Brunet

Luis Fernando Miranda

Marco Tulio Valente

Dalton Serey Guerrero

Douglas Castilho

Roberto da Silva Bigonha

Abstract

Similarity coefficients are widely used in software engineering for several purposes, such as identifying refactoring opportunities and remodularizing systems. Although the literature provides several similarity coefficients that differ in how they are computed, researchers tend to adopt, out of habit, the coefficients that others in their field are already using. Consequently, some approaches might be using a coefficient that is inadequate for their purpose. In this paper, we conduct a quantitative study that compares 18 coefficients to identify which one is the most appropriate for determining where a class should be located. Our evaluation covers 111 open source systems from Qualitas Corpus, which together contain more than 70,000 classes. As a result, we observed that Jaccard, one of the most used coefficients in our area, did not present the best results. While Jaccard correctly indicated the suitable module for 22% of the classes, other coefficients were able to do so for 60%.
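As an illustration of the kind of computation the abstract refers to, the following minimal sketch applies the Jaccard coefficient, |A ∩ B| / |A ∪ B|, to sets of structural dependencies and suggests the module whose classes are on average most similar to a target class. The abstract does not describe the paper's actual procedure, so the averaging strategy, the function names `jaccard` and `suggest_module`, and the toy data are all assumptions made for this example.

```python
from statistics import mean


def jaccard(a: set, b: set) -> float:
    """Jaccard coefficient |A ∩ B| / |A ∪ B|; taken as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def suggest_module(target_deps: set, modules: dict) -> tuple:
    """Return (best_module, scores), scoring each module by the mean Jaccard
    similarity between the target class and the module's classes.

    Averaging over a module's classes is an assumption of this sketch,
    not necessarily the strategy used in the paper.
    """
    scores = {
        name: mean(jaccard(target_deps, deps) for deps in class_deps)
        for name, class_deps in modules.items()
        if class_deps  # skip empty modules to avoid averaging nothing
    }
    return max(scores, key=scores.get), scores


# Hypothetical toy data: each module is a list of dependency sets, one per class.
modules = {
    "ui": [{"Button", "Event", "Color"}, {"Event", "Color", "Font"}],
    "persistence": [{"Connection", "Query"}, {"Query", "ResultSet", "Logger"}],
}
target = {"Event", "Color", "Logger"}

best, scores = suggest_module(target, modules)
print(best, scores)
```

In this toy data the target class shares two of its three dependencies with each class in `ui` and at most one with any class in `persistence`, so `ui` receives the higher score. Swapping `jaccard` for another coefficient only requires replacing that one function, which is the kind of substitution the study compares.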

Similar resources

Hapax - Enriching Reverse Engineering with Semantic Clustering

Many reverse engineering approaches focus on structural information and ignore semantic information like the naming of identifiers or comments. But developers put their domain knowledge into exactly these parts of the source code. Without understanding the semantics of the code, one cannot tell its meaning. We use Latent Semantic Indexing, an information retrieval technique [3], to retrieve the...

A Source Code Similarity System for Plagiarism Detection

Source code plagiarism is an easy to do task, but very difficult to detect without proper tool support. Various source code similarity detection systems have been developed to help detect source code plagiarism. Those systems need to recognize a number of lexical and structural source code modifications. For example, by some structural modifications (e.g. modification of control structures, mod...

Measuring Semantic Similarity using a Multi-Tree Model

Recommender systems and search engines are examples of systems that have used techniques such as Pearson’s product-moment correlation coefficient or Cosine similarity for measuring semantic similarity between two entities. These methods relinquish semantic relations between pairs of features in the vector representation of an entity. This paper describes a new technique for calculating semant...

Alignment-free local structural search by writhe decomposition

MOTIVATION: Rapid methods for protein structure search enable biological discoveries based on flexibly defined structural similarity, unleashing the power of the ever greater number of solved protein structures. Projection methods show promise for the development of fast structural database search solutions. Projection methods map a structure to a point in a high-dimensional space and compare tw...

Measuring Similarity of Large Software Systems Based on Source Code Correspondence

Knowing the quantitative similarity of large software systems is an important and intriguing issue. In this paper, a similarity metric between two sets of source code files based on the correspondence of overall source code lines is proposed. A Software similarity MeAsurement Tool, SMAT, was developed and applied to various versions of an operating system (BSD UNIX OS). The resulting similarity...

Publication date: 2013
